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Abstract 

Estimation of small failure probabilities is one of the most important and chal- 
lenging computational problems in reliability engineering. The failure probability 
is usually given by an integral over a high-dimensional uncertain parameter space 
that is difficult to evaluate numerically. This paper focuses on enhancements to 
Subset Simulation (SS), proposed by Au and Beck, which provides an efficient al- 
gorithm based on MCMC (Markov chain Monte Carlo) simulation for computing 
small failure probabilities for general high-dimensional reliability problems. First, 
we analyze the Modified Metropolis algorithm (MMA), an MCMC technique, which 
is used in SS for sampling from high-dimensional conditional distributions. The effi- 
ciency and accuracy of SS directly depends on the ergodic properties of the Markov 
chains generated by MMA, which control how fast the chain explores the parameter 
space. We present some observations on the optimal scaling of MMA for efficient 
exploration, and develop an optimal scaling strategy for this algorithm when it is 
employed within SS. Next, we provide a theoretical basis for the optimal value of 
the conditional failure probability po, an important parameter one has to choose 
when using SS. We demonstrate that choosing any po G [0.1,0.3] will give similar 
efficiency as the optimal value of pq. Finally, a Bayesian post-processor SS-|- for the 
original SS method is developed where the uncertain failure probability that one is 
estimating is modeled as a stochastic variable whose possible values belong to the 
unit interval. Simulated samples from SS are viewed as informative data relevant 
to the system's reliability. Instead of a single real number as an estimate, SS-|- pro- 
duces the posterior PDF of the failure probability, which takes into account both 
prior information and the information in the sampled data. This PDF quantifies 
the uncertainty in the value of the failure probability and it may be further used 
in risk analyses to incorporate this uncertainty. To demonstrate SS-I-, we consider 
its application to two different reliability problems: a linear reliability problem and 
reliability analysis of an elasto-plastic structure subjected to strong seismic ground 
motion. The relationship between the original SS and SS-|- is also discussed. 
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1 Introduction 



One of the most important and challenging problems in reliability engineering is to es- 
timate the failure probability pp for a system, that is, the probability of unacceptable 
system performance. This is usually expressed as an integral over a high- dimensional 
uncertain parameter space: 



where 9 E M"^ represents the uncertain parameters needed to specify completely the ex- 
citation and dynamic model of the system; 7i{9) is the joint probability density function 
(PDF) for 6] F gW^ is the failure domain in the parameter space (i.e. the set of parame- 
ter values that lead to performance of the system that is considered to be unacceptable); 
and If{0) stands for the indicator function, i.e. If{0) = I if E F and If{0) = if 9 ^ F. 
The dimension d is typically large for dynamic reliability problems (e.g. d ~ 10^) because 
the stochastic input time history is discretized in time. As a result, the usual numerical 
quadrature methods for integrals are not computationally feasible for evaluation ([1]). 

Over the past decade, the engineering research community has realized the importance 
of advanced stochastic simulation methods for reliability analysis. As a result, many 
different efficient algorithms have been developed recently, e.g. Subset Simulation [2], 
Importance Sampling using Elementary Events [3], Line Sampling [2B], Auxiliary domain 
method [25], Spherical Subset Simulation [J^, Horseracing Simulation |13], to name but 
a few. 

This paper focuses on enhancements to Subset Simulation (SS), proposed by Au and 
Beck in [2], which provides an efficient algorithm for computing failure probabilities for 
general high-dimensional reliability problems. It has been shown theoretically [2] and 
verified with different numerical examples (e.g. [U [5l |39]) that SS gives much higher 
computational efficiency than standard Monte Carlo Simulation when estimating small 
failure probabilities. Recently, various modifications of SS have been proposed: SS with 
splitting [9], Hybrid SS [10], and Two-Stage SS [23]. It is important to highlight, however, 
that none of these modifications offer a drastic improvement over the original algorithm. 

We start with the analysis of the Modified Metropolis algorithm (MMA), a Markov 
chain Monte Carlo technique used in SS, which is presented in Section [2l The efficiency 
and accuracy of SS directly depends on the ergodic properties of the Markov chains 
generated by MMA. In Section |3l we examine the optimal scaling of MMA to tune the 
parameters of the algorithm to make the resulting Markov chain converge to stationarity 
as fast as possible. We present a collection of observations on the optimal scaling of MMA 
for different numerical examples, and develop an optimal scaling strategy for MMA when 
it is employed within SS for estimating small failure probabilities. 

One of the most important components of SS which affects its efficiency is the choice 
of the sequence of intermediate threshold values or, equivalently, the intermediate failure 
probabilities (see Section |21 where the original SS algorithm is described). In Section HJ 
a method for optimally choosing these probabilities is presented. 

The usual interpretation of Monte Carlo methods is consistent with a purely frequen- 
tist approach, meaning that they can be interpreted in terms of the frequentist definition 
of probability which identifies it with the long-run relative frequency of occurrence of 
an event. An alternative interpretation can be made based on the Bayesian approach 
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which views probabihty as a measure of the plausibihty of a proposition conditional on 
incomplete information that does not allow us to establish the truth or falsehood of the 
proposition with certainty. Bayesian probability theory was, in fact, primarily developed 
by the mathematician and astronomer Laplace [29l [30] for statistical analysis of astro- 
nomical observations. Moreover, Laplace developed the well-known Bayes' theorem in full 
generality, while Bayes did it only for a special case [6]. A complete development based 
on Laplace's theory, with numerous examples of its applications, was given by the mathe- 
matician and geophysicist Jeffreys [22] in the early 20*^ century. Despite its usefulness in 
applications, the work of Laplace and Jeffreys on probability was rejected in favor of the 
frequentist approach by most statisticians until late last century. Because of the absence 
of a strong rationale behind the theory at that time, it was perceived as subjective and 
not rigorous by many statisticians. A rigorous logic foundation for the Bayesian approach 
was given in the seminal work of the physicist Cox [TTl [12] and expounded by the physi- 
cist Jaynes [201 121] , enhancing Bayesian probability theory as a convenient mathematical 
language for inference and uncertainty quantification. Although the Bayesian approach 
usually leads to high-dimensional integrals that often cannot be evaluated analytically 
nor numerically by straightforward quadrature, the development of Markov chain Monte 
Carlo algorithms and increasing computing power have led over the past few decades to 
an explosive growth of Bayesian papers in all research disciplines. 

In Section [5] of this paper, a Bayesian post-processor for the original Subset Simula- 
tion method is developed, where the uncertain failure probability that one is estimating 
is modeled as a stochastic variable whose possible values belong to the unit interval. Al- 
though this failure probability is a constant defined by the integral in ([1]), its exact value 
is unknown because the integral cannot be evaluated; instead, we must infer its value 
from available relevant information. Instead of a single real number as an estimate, the 
post-processor, written as SS-f ("SS-plus") for short, produces the posterior PDF of the 
failure probability, which takes into account both relevant prior information and the in- 
formation from the samples generated by SS. This PDF expresses the relative plausibility 
of each possible value of the failure probability based on this information. Since this PDF 
quantifies the uncertainty in the value of pp, it can be fully used in risk analyses (e.g. for 
life-cycle cost analysis, decision making under risk, etc.), or it can be used to give a point 
estimate such as the most probable value based on the available information. 

2 Subset Simulation 

2.1 Basic idea of Subset Simulation 

The original and best known stochastic simulation algorithm for estimating high-dimensional 
integrals is Monte Carlo Simulation (MCS). In this method the failure probability pp is 
estimated by approximating the mean of If{0) in & by its sample mean: 




(2) 
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system failure according to a model of the system dynamics. Notice that each evaluation 
of Ip requires a deterministic system analysis to be performed to check whether the sample 
implies failure. The main advantage of MCS is that its efficiency does not depend on the 
dimension d of the parameter space. Indeed, straightforward calculation shows that the 
coefficient of variation (c.o.v) of the Monte Carlo estimate (E]), serving as a measure of 
accuracy in the usual interpretation of MCS, is given by: 

However, MCS has a serious drawback: it is inefficient in estimating small failure proba- 
bilities. If Pf is very small, pp <^ 1, then it follows from ^ that the number of samples 

(or, equivalently, number of system analyses) needed to achieve an acceptable level of 
accuracy is very large, oc 1/pf ^ 1- This deficiency of MCS has motivated research 
to develop more efficient stochastic simulation algorithms for estimating small failure 
probabilities in high-dimensions. 

The basic idea of Subset Simulation [2] is the following: represent a very small failure 
probability pp as a product of larger probabilities so pp = HJliPi' where the factors pj 
are estimated sequentially, pj ~ pj, to obtain an estimate pp^ for pp as p^p^ = YYjLi Pj- To 
reach this goal, let us consider a decreasing sequence of nested subsets of the parameter 
space, starting from the entire space and shrinking to the failure domain F: 

= FoD FiD ...D D F^ = F. (4) 

Subsets Fi, . . . , Fm~i are called intermediate failure domains. As a result, the failure 
probability pp = P{F) can be rewritten as a product of conditional probabilities: 

m m 

PF = l[P{F,\F,.^) = l[p„ (5) 

j=i j=i 

where pj = P{Fj\Fj_i) is the conditional probability at the (j — 1)*'^ conditional level. 
Clearly, by choosing the intermediate failure domains appropriately, all conditional prob- 
abilities Pj can be made large. Furthermore, they can be estimated, in principle, by the 
fraction of independent conditional samples that cause failure at the intermediate level: 

i=l 

Hence, the original problem (estimation of the small failure probability pp) is replaced by 
a sequence of m intermediate problems (estimation of the larger failure probabilities pj, 
j = l,...,m). 

The first probability pi = P{Fi\Fq) = P{Fi) is straightforward to estimate by MCS, 
since requires sampling from 7r(-) that is assumed to be readily sampled. However, if 
j > 2, to estimate pj using one needs to generate independent samples from conditional 
distribution 7r(-|Fj„i), which, in general, is not a trivial task. It is not efficient to use MCS 
for this purpose, especially at higher levels, but it can be done by a specifically tailored 
Markov chain Monte Carlo technique at the expense of generating dependent samples. 

Markov chain Monte Carlo (MCMC) [211 [Ml [SS] is a class of algorithms for sampling 
from multi-dimensional target probability distributions that cannot be directly sampled. 
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at least not efficiently. These methods are based on constructing a Markov chain that 
has the distribution of interest as its stationary distribution. By simulating samples from 
the Markov chain, they will eventually be draws from the target probability distribution 
but they will not be independent samples. In Subset Simulation, the Modified Metropolis 
algorithm (MMA) [2], an MCMC technique based on the original Metropolis algorithm 
[331 HE], is used for sampling from the conditional distributions 7r(-|Fj_i). 
Remark 1. It was observed in [2] that the original Metropolis algorithm does not work 
in high- dimensional conditional probability spaces, because it produces a Markov chain 
with very highly correlated states. The geometrical reasons for this are discussed in [26] . 



2.2 Modified Metropolis algorithm 

Suppose we want to generate a Markov chain with stationary distribution 

= -F(Fr ^ — m — ' ^ ^ 

where F C M*^ is a subset of the parameter space. Without significant loss of generality, 
we assume here that 7i{9) = 11^=1 ^^(^fc)) i-^- components of 9 are independent (but 
are not so when conditioned on ¥). MMA differs from the original Metropolis algorithm 
algorithm in the way the candidate state ^ = (^i, • • • , ^d) is generated. Instead of using a 
(i-variate proposal PDF on M*^ to directly obtain the candidate state, in MMA a sequence 
of univariate proposal PDFs is used. Namely, each coordinate of the candidate state is 
generated separately using a univariate proposal distribution dependent on the coordinate 
6k of the current state. Then a check is made whether the ci-variate candidate generated 
in such a way belongs to the subset F in which case it is accepted as the next Markov chain 
state; otherwise it is rejected and the current MCMC sample is repeated. To summarize, 
the Modified Metropolis algorithm proceeds as follows: 



Modified Metropolis algorithm |2 



Input : 

t> 9^^^ G F, initial state of a Markov chain; 

> N, total number of states, i.e. samples; 

> vri(-), . . . , 7id{-), marginal PDFs of 6'i, . . . , 9d, respectively; 

> Si{-\a), . . . , Sd{-\a), univariate proposal PDFs depending on a parameter a G 
and satisfying the symmetry property Sk{f3\a) = Sk{a\f3), k = 1, . . . ,d. 

Algorithm: 

for i = 1, . . . , iV - 1 do 

% Generate a candidate state ^: 
for k = 1, . . . , d do 

Sample 4 ~ 5'fc(-|6'J.'^) 
Compute the acceptance ratio 



k) 



^k{9f) 

Accept or reject by setting 

with probability min{l,r}, 
9^^^ with probability 1 — min{l,r}. 
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end for 

Check whether ^ G F by system analysis 
and accept or reject ^ by setting 

end for 

Output : 

► 9^^\ . . . , 9^^\ N states of a Markov chain with stationary distribution 7r(-|F). 

Schematically, the Modified Metropolis algorithm is shown in Figure 1. For complete- 
ness and the reader's convenience, the proof that vr(-|F) is the stationary distribution for 
the Markov chain generated by MMA is given in the Appendix. 

Remark 2. The symmetry property Sk{f3\a) = Sk{c(\(3) does not play a critical role. 
If Sk does not satisfy this property, by replacing the "Metropolis" ratio in by the 
"Metropolis-Hastings" ratio 

we obtain an MCMC algorithm referred to as the Modified Metropolis-Hastings algorithm. 

Thus, if we run the Markov chain for sufficiently long (the burn- in period), starting 
from essentially any "seed" 9^^^ E F, then for large N the distribution of 9^^^ will be 
approximately 7r(-|F). Note, however, that in any practical application it is very difficult 
to check whether the Markov chain has reached its stationary distribution. If the seed 
^(1) ~ 7r(-|F), then all states 9^'^^ will be automatically distributed according to the target 
distribution, 6*^*^ ~ 7r(-|F), since it is the stationary distribution for the Markov chain. 
This is called perfect sampling [55] and Subset Simulation has this property because of 
the way the seeds are chosen [2]. 

Let us assume now that we are given a seed ~ 7r(-|Fj„i), where j = 2, . . . ,m. 
Then, using MMA, we can generate a Markov chain with N states starting from this 
seed and construct an estimate for pj similar to ([6]), where MCS samples are replaced by 
MCMC samples: 

1 ^ 

/>.-Pr^-^^ = ^E^^.(^?-i)' Ofl,^.{.\F,^^. (12) 

4 = 1 

Note that all samples • • • ; ^j-i in (fT2|) are identically distributed in the stationary 
state of the Markov chain, but are not independent. Nevertheless, these MCMC samples 
can be used for statistical averaging as if they were i.i.d., although with some reduction 
in efficiency [H]. Namely, the more correlated • • • , 9j^\ are, the less efficient is the 
estimate (IT^ . The correlation between successive samples is due to proposal PDFs Sk, 
which govern the generation of the next state of the Markov chain from the current one 
in MMA. Hence, the choice of proposal PDFs 5*^ controls the efficiency of estimate f|T2l) . 
making this choice very important. It was observed in |2] that the efficiency of MMA 
depends on the spread of proposal distributions, rather then on their type. Both small 
and large spreads tend to increase the dependence between successive samples, slowing 



6 



the convergence of the estimator. Large spreads may reduce the acceptance rate in fllOl) . 
increasing the number of repeated MCMC samples. Small spreads, on the contrary, may 
lead to a reasonably high acceptance rate, but still produce very correlated samples due 
to their close proximity. To find the optimal spread of proposal distributions for MMA is 
a non-trivial task which is discussed in Section [3l 

Remark 3. In [15] another modification of the Metropolis algorithm, called Modified 
Metropolis-Hastings algorithm with Delayed Rejection (MMHDR), has been proposed. 
The key idea behind MMHDR is to reduce the correlation between states of the Markov 
chain. A way to achieve this goal is the following: whenever a candidate ^ is rejected in 
( ITOj) . instead of getting a repeated sample = as the case of MMA, a new candi- 

date ^ is generated using a new set of proposal PDFs Sk- Of course, the acceptance ratios 
(IH]) for the second candidate have to be adjusted in order to keep the target distribution 
stationary. In general, MMHDR generates less correlated samples than MMA but it is 
computationally more expensive. 



2.3 Subset Simulation algorithm 

Subset Simulation uses the estimates ([6]) for pi and ( |T2l) for pj, j > 2, to obtain the 
estimate for the failure probability: 

m 

The remaining ingredient of Subset Simulation that we have to specify is the choice of 
intermediate failure domains Fi, . . . , Fm-i- Usually, performance of a dynamical system 
is described by a certain positive- valued performance function g : M.'^ M+, for instance, 
g{6) may represent some peak (maximum) response quantity when the system model 
is subjected to the uncertain excitation 6. Then the failure region, i.e. unacceptable 
performance region, can be defined as a set of excitations that lead to the exceedance of 
some prescribed critical threshold b: 

F = {eeR'^: g{e) > b}. (14) 

The sequence of intermediate failure domains can then be defined as 

F, = {eeR': g{9) > 6,}, (15) 

where < 6i < . . . < fcm-i < bm = b. Intermediate threshold values bj define the values of 
the conditional probabilities pj = P{Fj\Fj_i) and, therefore, affect the efficiency of Subset 
Simulation. In practical cases it is difficult to make a rational choice of the 6j-values in 
advance, so the bj are chosen adaptively (see (fT6!) below) so that the estimated conditional 
probabilities are equal to a fixed value po G (0, 1). We will refer to po as the conditional 
failure probability. 



Subset Simulation algorithm |[2] 



Input : 

> Po, conditional failure probability; 

> A^, number of samples per conditional level. 
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Algorithm: 

Set j = 0, number of conditional level 

Set Np{j) = 0, number of failure samples at level j 

Sample '~ ^(O 

for i = 1, . . . , N do 

if g(i) = g{ef) > 6 do 
NfU) ^ NfU) + 1 

end if 
end for 

while NF{j)/N < po do 
J ^ J + 1 

Sort {^W}: < < . . . < ^(»iv) 
Define 

fe. = ^ (16) 

for k = 1, . . . , Npo do 

Starting from 6^^^''' = Q^^-^-'^po+k) ^ generate 1/po states 

of a Markov chain . . . , ef^^°'^'^ ~ 7r(-|Fj), using MMA. 
end for 

Renumber: {Of''}^:^}/^ ^ ef\ Of^ ~ 7r(-|F,) 
for ? = 1, . . . , do 

if ^(0 = g{^3f) >bdO 

Nf{j) ^ Nf{j) + 1 
end if 
end for 
end while 
Output : 

► pp^, estimate of pp- 

Pf=f5^ (17) 



Schematically, the Subset Simulation algorithm is shown in Figure |2j 
The adaptive choice of 6j- values in f|T6|) guarantees, first, that all seeds Oj^^''^ are 
distributed according to 7r{-\Fj) and, second, that the estimated conditional probability 
P{Fj\Fj_i) is equal to Pq. Here, for convenience, po is assumed to be chosen such that NpQ 
and 1/po are positive integers, although this is not strictly necessary. In [2] it is suggested 
to use po = 0.1. The optimal choice of conditional failure probability is discussed in the 
next section. 

Remark 4. Subset Simulation provides an efficient stochastic simulation algorithm for 
computing failure probabilities for general reliability problems without using any specific 
information about the dynamic system other than an input-output model. This indepen- 
dence of a system's inherent properties makes Subset Simulation potentially useful for 
applications in different areas of science and engineering where the notion of "failure" 
has its own specific meaning, e.g. in Computational Finance to estimate the probability 
that a stock price will drop below a given threshold within a given period of time, in 
Computational Biology to estimate the probability of gene mutation, etc. 
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3 Tuning of the Modified Metropolis algorithm 



The efficiency and accuracy of Subset Simulation directly depends on tlie ergodic proper- 
ties of the Markov chain generated by the Modified Metropohs algorithm; in other words, 
on how fast the chain explores the parameter space and converges to its stationary distri- 
bution. The latter is determined by the choice of one-dimensional proposal distributions 
Sk, which makes this choice very important. In spite of this, the choice of proposal PDFs 
is still largely an art. It was observed in |2] that the efficiency of MMA is not sensitive to 
the type of the proposal PDFs; however, it depends on their spread (e.g. their variance). 

Optimal scaling refers to the need to tune the parameters of the algorithm to make 
the resulting Markov chain converge to stationarity as fast as possible. The issue of 
optimal scaling was recognized in the original paper by Metropolis et al [33]. Gelman, 
Roberts, and Gilks [17] were the first authors to obtain theoretical results on the optimal 
scaling of the original Metropolis algorithm. They proved that for optimal sampling from 
a high-dimensional Gaussian distribution, the Metropolis algorithm should be tuned to 
accept approximately 23% of the proposed moves only. Since then many papers have 
been published on optimal scaling of the original Metropolis algorithm. In this section, in 
the spirit of ^7\, we address the following question which is of high practical importance: 
what is the optimal variance of the univariate Gaussian proposal PDFs for simulating a 
high- dimensional Gaussian distribution conditional on some specific domain using MMA? 

This section is organized as follows: in Subsection 13. II we recall the original Metropolis 
algorithm and provide a brief overview of existing results on its optimal scaling; in Sub- 
section 13.21 we present a collection of observations on the optimal scaling of the Modified 
Metropolis algorithm for different numerical examples, and discuss the optimal scaling 
strategy for MMA when it is employed within Subset Simulation for estimating small 
failure probabilities. 

3.1 Metropolis algorithm: a brief history of its optimal scaling 

The Metropolis algorithm is the most popular class of MCMC algorithms. Let vr be the 
target PDF on M'^; 6''^"^ be the current state of the Markov chain; and S{-\9'^"-^) be a 
symmetric (i.e. S'(a|/3) = S{l3\a)) (i-variate proposal PDF depending on 9^"'\ Then the 
Metropolis update 9^"'^ of the Markov chain works as follows: first, simulate 

a candidate state ^ ~ S{-\9^'^^); next, compute the acceptance probability a{^\9^'^^) = 
min{l, 7r(^)/7r(^'^"^)}; and, finally, accept ^ as a next state of the Markov chain, i.e. set 
= with probability a{^\9^'^^) or reject ^ by setting = ^(") with the remaining 

probability l-a(^ I ^(")). It easy to prove that such updating leaves vr invariant, i.e. if 9^'^^ 
is distributed according to vr, then so is 9^^'^^\ Hence the chain will eventually converge 
to vr as its stationary distribution. Practically it means that if we run the Markov chain 
for a long time, starting from any 9^^^ G M'^, then for large N the distribution of 9^^^ will 
be approximately vr. 

The variance cx^ of the proposal PDF S turns out to have a significant impact on the 
speed of convergence of the Markov chain to its stationary distribution. Indeed, if is 
small, then the Markov chain explores its state space very slowly. On the other hand, if 
cr^ is large, the probability to accept a new candidate state is very low and this results 
in a chain remaining still for long periods of time. Since the Metropolis algorithm with 
extremal values of the variance of the proposal PDF produce a chain that explores its 
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state space slowly, it is natural to expect the existence of an optimal o"^ for which the 
convergence speed is maximized. 

The importance of optimal scaling was already realized in the landmark paper [33] 
where the Metropolis algorithm was first introduced. Metropolis et al developed an al- 
gorithm for generating samples from the Boltzmann distribution for solving numerical 
problems in statistical mechanics. In this work the uniform proposal PDF was used, 
5'(^|6'(")) = ?7[g(n)_„ 0(n)+Q,](O, and it was noted: 

"It may be mentioned in this connection that the maximum displacement a 
must be chosen with some care; if too large, most moves will be forbidden, 
and if too small, the configuration will not change enough. In either case it 
will then take longer to come to equilibrium." 

In [18] Hastings generalized the Metropolis algorithm. Namely, he showed that the pro- 
posal distribution need not be uniform, and it need not to be symmetric. In the latter case, 

the acceptance probability must be slightly modified: a[^\Q^'^^) = min |l, Jn(ly}o(J}(l)) }• 



The corresponding algorithm is called the Metropolis-Hastings algorithm. Furthermore, 
Hastings emphasized that the original sampling method has a general nature and can 
be applied in different circumstances (not only in the framework of statistical mechan- 
ics) and that Markov chain theory (which is absent in [33]) is a natural language for 
the algorithm. Among other insights, Hastings made the following useful yet difficult to 
implement recommendation: 

Choose a proposal distribution "so that the sample point in one step may 
move as large a distance as possible in the sample space, consistent with a low 
rejection rate." 

Historically, the tuning of the proposal's variance was usually performed by trial- 
and-error, typically using rules of thumb of the following form: select such that the 
corresponding acceptance rate, i.e. the average number of accepted candidate states, is 
between 30% and 70%. The rationale behind such rule is that too low an acceptance rate 
means that the Markov chain has many repeated samples, while too high an acceptance 
rate indicates that the chain moves very slowly. Although qualitatively correct, these 
rules suffered from the lack of theoretical justification for the lower and upper bounds for 
the acceptance rate. The first theoretical result on the optimal scaling of the Metropolis 
algorithm was obtained by Gelman et al [17] . It was proved that in order to perform 
optimally in high dimensional spaces, the algorithm should be tuned to accept as small 
as 23% of the proposed moves. This came as an unexpected and counter-intuitive result. 
Indeed, this states that the Markov chain should stay still about 77% of the time in order 
to have the fastest convergence speed. Let us formulate the main result more precisely. 

Suppose that all components of 6' G M*^ are i.i.d, i.e. the target distribution n{6) has the 
product form, n^O) = Yii=i fi^i)^ where the one-dimensional density / satisfies certain 
regularity conditions (namely, / is a C^-function and (log/)' is Lipschitz continuous). 
Then the optimal "random walk" Gaussian proposal PDF S'(^|6l(")) = J\f {^\9^''\ aHd) has 
the following properties: 

1. The optimal standard deviation is o" ~ 2A/\fdI, where / = E/[((log /)')^] = 




dx measures the "roughness" of /. The smoother the density is, the 
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smaller I is, and, therefore, the larger a is. In particular, for a one- dimensional case 
[d = 1) and standard Gaussian / (J = 1): o" ~ 2.4 (a surprisingly high value!); 

2. The acceptance rate of the corresponding Metropolis algorithm is approximately 
44% for d = 1 and declines to 23% when — )■ oo. Moreover, the asymptotic 
optimality of accepting 23% of proposed moves is approximately true for dimension 
as low as d = 6. 

This result gives rise to the following useful heuristic strategy, that is easy to imple- 
ment: tune the proposal variance so that the average acceptance rate is roughly 25%. 
In spite of the i.i.d. assumption for the target components, this result is believed to be 
robust and to hold under various perturbations of the target distribution. Being aware 
of practical difficulties of choosing the optimal a^, Gelman et al provided a very useful 
observation: 

"Interestingly, if one cannot be optimal, it seems better to use too high a value 
of a than too low." 

Remark 5. This observation is consistent with the numerical result obtained recently in 
[15] : an increased variance of the second stage proposal PDFs improves the performance 
of the MMHDR algorithm. 

Since the pioneering work [T7] , the problem of optimal scaling has attracted the atten- 
tion of many researchers and optimal scaling results have been derived for other types of 
MCMC algorithms. For instance, the Metropolis-adapted Langevin algorithm (MALA) 
was studied in [3^ and it was proved that the asymptotically optimal acceptance rate 
for MALA is approximately 57%. For a more detailed overview of existing results on the 
optimal scaling of the Metropolis algorithm see [Hj and references cited therein. 



3.2 Optimal scaling of the Modified Metropolis algorithm 

In this subsection we address two questions: what is the optimal variance cx^ of the 
univariate Gaussian proposal PDFs Sk{-\fi) = A/'(-|/i, a^), k = for simulating a 

high-dimensional conditional Gaussian distribution vr(-|F) = A/'(-|0, Id)/i?(-)/P(F) using 
the Modified Metropolis algorithm and what is the optimal scaling strategy for Modified 
Metropolis when it is employed within Subset Simulation for estimating small failure 
probabilities? 

Let us first define what we mean by "optimal" variance. Let ef_:^ be the the i^^ 
sample in the /c*^ Markov chain at simulation level j — 1- The conditional probability 
Pj = P{Fj\Fj-i) is then estimated as follows: 



N 

k=l i=l 



where A^^^ is the number of Markov chains and A^^ is the total number of samples simulated 
from each of these chains, Ng = N/Nc, so that the total number of Markov chain samples 
is A^. An expression for the coefficient of variation (c.o.v.) oi pj, derived in [2], is given 
by: 
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where 
and 

Bf = niF, {eflt)h^ {ef^'')\ - v] (21) 

is the autocovariance of the stationary stochastic process X{%) = Ip^{ef:l) at lag z. The 
term 1^(1 — pj)/Npj in ( !T9|) is the c.o.v. of the MCS estimator with independent 
samples. The c.o.v. of pj can thus be considered as the one in MCS with an effective 
number of independent samples iV/(l + 7^). The efficiency of the estimator using de- 
pendent MCMC samples (7^- > 0) is therefore reduced compared to the case when the 
samples are independent (7^ = 0). Hence, 7^ given by (1201) can be considered as a measure 
of correlation between the states of a Markov chain and smaller values of 7^- imply higher 
efficiency. 

Remark 6. Formula f|T9|) was derived assuming that the Markov chain generated according 
to MMA is ergodic and that the samples generated by different chains are uncorrelated 
through the indicator function, i.e. E[/ir. (6')/^?. (6'')] — = if ^ and 9' are from different 
chains. The latter, however, may not be always true, since the seeds for each chain may be 
dependent. Nevertheless, the expression in f[T9]) provides a useful theoretical description 
of the c.o.v. of f)j. 

Remark 7. The autocovariance sequence Rj'\ i = 0, . . . , Ng — 1, needed for calculation of 
7j, can be estimated using the Markov chain samples at the (j — 1)*^ level by: 

^ k=l i'=l 

Note that in general, 7^ depends on the number of samples Ng in the Markov chain, 
the conditional probability pj, the intermediate failure domains Fj_i and Fj, and the 
standard deviation aj of the proposal PDFs Sk{-\fi) = M{-\fi,o-j). According to the 
"basic" description of the Subset Simulation algorithm given in Section [2l pj = po for all 
j and Ng = 1/po- The latter, as it has been already mentioned, is not strictly necessary, 
yet convenient. In this subsection, the value po is chosen to be 0.1, as in the original paper 
[2]. In this setting, 7^ depends only on the standard deviation aj and the geometry of 
Fj-i and Fj. For a given reliability problem (i.e. for a given performance function g that 
defines domains Fj for all j), cr°^* is said to be the optimal spread of the proposal PDFs 
at level j, if it minimizes the value of 7^: 

(T°P* = arg mm 7^- {aj ) (23) 

We will refer to 7^ = 7j((7'j) as '-f- efficiency of the Modified Metropolis algorithm with 
proposal PDFs Af{-\fi, aj) at level j. 

Consider two examples of the sequence of intermediate failure domains. 



Example 1 (Exterior of a ball). Let 6 = re E M , where e is a unit vector and r = 
For many reasonable performance functions g, if r is large enough, then 6 E F = {6 E 
M.'^ : g{9) > b}, i.e. ^ is a failure point, regardless of e. Therefore, an exterior of a ball. 
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Br = {6 & : \\6\\ > r}, can serve as an idealized model of many failure domains. Define 
the intermediate failure domains as follows: 

Fj=Br^, (24) 

where the radii rj are chosen such that P{Fj\Fj-i) = po, i.e r| = F~2^{1 — Pq-'), where F^2 

denotes the cumulative distribution function (CDF) of the chi-square distribution with d 
degrees of freedom. The dimension d is chosen to be 10^. 

Example 2 (Linear case). Consider a linear reliability problem with performance function 
g{6) = aFO + 6, where a G M"^ and 6 G M are fixed coefficients. The corresponding 
intermediate failure domains Fj are half-spaces defined as follows: 

F, = {eeW':{e,ea)>P,}, (25) 

where = is the unit normal to the hyperplane specified by g, and the values of /3j 

are chosen such that P{Fj\Fj_i) = po, i.e (3j = $^^(1 — pg-'), where $ denotes the CDF 
of the standard normal distribution. The dimension d is chosen to be 10'^. 

For both examples, 7j as a function of aj is plotted in Fig. |3] and the approximate 
values of the optimal spread 0"°^* are given in Table 1 for simulation levels j = 1, . . . , 6. As 
expected, the optimal spread 0"°^* decreases when j increases, and based on the numerical 
values in Table 1, cr°''* seems to converge to approximately 0.3 and 0.4 in Example 1 and 
2, respectively. The following properties of the function 7^ = 7j(crj) are worth mentioning: 

(i) 7j increases very rapidly, when aj goes to zero; 

(ii) 7j has a deep trough around the optimal value cr°^^, when j is large (e.g., j > 4). 

Interestingly, these observations are consistent with the statement given in [17] and cited 
above: if one cannot be optimal (due to (ii), it is indeed difficult to achieve optimality), 
it is better to use too high a value of aj than too low. 

The question of interest now is what gain in efficiency can we achieve for a proper 
scaling of the Modified Metropolis algorithm when calculating small failure probabilities? 
We consider the following values of failure probability: pp = 10"'^, k = 2, . . . ,6. The c.o.v. 
of the failure probability estimates obtained by Subset Simulation are given in Fig. HJand 
Fig. E] for Examples 1 and 2, respectively. The dashed (solid) curves correspond to the 
case when = 300 (A^ = 1000) samples are used per each intermediate failure region. For 
estimation of each value of the failure probability, two different MMAs are used within SS: 
the optimal algorithm with aj = 0"°''* (marked with stars); and the reference algorithm 
with (Tj = 1 (marked with squares). The corresponding c.o.v's are denoted by 5opt and 61, 
respectively. From Fig. |4]and Fig. [5] it follows that the smaller pp, the more important 
to scale MMA optimally. When pp = 10~^, the optimal c.o.v 6opt is approximately 80% 
of the reference c.o.v. 61 for both examples, when A^ = 1000. 

Despite its obvious usefulness, the optimal scaling of the Modified Metropolis algo- 
rithm is difficult to achieve in practice. First, as shown in Table 1, the values of the 
optimal spread cr°^* are different for different reliability problems. Second, even for a 
given reliability problem, to find cr°^* is computationally expensive because of property 
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(ii) of 7j; and our simulation results show that the qualitative properties (i) and (ii) gen- 
erally hold for different reliability problems, not only for Examples 1 and 2. Therefore, 
we look for heuristic to choose aj that is easy to implement and yet gives near optimal 
behavior. 

It has been recognized for a long time that, when using an MCMC algorithm, it is 
useful to monitor its acceptance rate. Both 7-efficiency jj and the acceptance rate pj at 
level j depend on aj. For Examples 1 and 2, the approximate values of the acceptance rate 
that corresponds to the reference value aj = 1 and the optimal spread aj = a°'^^ are given 
in Table 2; 7^ as a function of pj is plotted in Fig. [6l for simulation levels j = 1, . . . , 6. 
A key observation is that, contrary to (ii), 7^ is very flat around the optimal acceptance 
rate p°^*, which is defined as the acceptance rate that corresponds to the optimal spread, 
i.e. p°''* = pj(cr°^*). Furthermore, according to our simulation results this behavior is 
typical, and not specific just for the considered examples. This observation gives rise to 
the following heuristic scaling strategy: 

At simulation level j > 1 select aj such that the corresponding acceptance rate 
Pj is between 30% and 50%. 

This strategy is easy to implement in the context of Subset Simulation. At each 
simulation level j, Markov chains are generated. Suppose, we do not know the optimal 
spread 0"°^* for our problem. We start with a reference value, say aj'" = 1, for the first 
n chains. Based only on these n chains, we calculate the corresponding acceptance rate 
pj'*^. If p^'^ is too low (i.e. it is smaller than 30%) we decrease the spread and use 
^n+i:2n ^ ^i.n next u chaius. If p]'" is too large (i.e. it is larger than 50%) we 

increase the spread and use > crj-" for the next n chains. We proceed like this 

until all Nf. Markov chains have been generated. Note that according to this procedure, 
CTj is kept constant within a single chain and it is changed only between chains. Hence 
the Markovian property is not destroyed. The described strategy guaranties that the 
corresponding scaling on the Modified Metropolis algorithm is nearly optimal. 

4 Optimal choice of conditional failure probability 

The parameter pq governs how many intermediate failure domains Fj are needed to reach 
the target failure domain F, which in turn affects the efficiency of Subset Simulation. A 
very small value of the conditional failure probability means that fewer intermediate levels 
are needed to reach F but it results in a very large number of samples N needed at each 
level for accurate estimation of the small conditional probabilities pj = P{Fj\Fj^i). In 
the extreme case when po < pr, Subset Simulation reduces to the standard Monte Carlo 
simulation. On the other hand, increasing the value of po will mean fewer samples are 
needed for accurate estimation at each level but it will increase the number of intermediate 
conditional levels m. In this section we provide a theoretical basis for the optimal value 
of the conditional failure probability. 

We wish to choose the value Pq such that the coefficient of variation (c.o.v) of the failure 
probability estimator pf,^ is as small as possible, for the same total number of samples. 
In [2], an analysis of the statistical properties of the Subset Simulation estimator is given. 
If the conditional failure domains are chosen so that the corresponding estimates of the 
conditional probabilities are all equal to po and the same number of samples is used in 
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the simulation at each conditional level, then the c.o.v. of the estimator pf,^ for a failure 
probability Pf = (requiring m conditional levels) is approximated by 

5^-^^(1+7), (26) 

where 7 is the average correlation factor over all levels (assumed to be insensitive to po) 
that reflects the correlation among the MCMC samples in each level and depends on the 
choice of the spread of the proposal PDFs. Since the total number of samples Nt = mN 
and the number of conditional levels m = logpp/ logpo? (|26|) can be rewritten as follows: 

Po(logpo)^ Nt 

Note that for given target failure probability pp and the total number of samples N^, the 
second factor in f l27|) does not depend on Pq. Thus, minimizing the first factor to minimize 
the c.o.v. 6 yields the optimal value as p'^^ ~ 0.2. Figure [7] shows the variation of 5 as a 
function of po according to (ET]) for pp = 10-^ Nt = 2000, and 7 = 0, 2, 4, 6, 8, 10. This 
figure indicates that S is relatively insensitive to po around its optimal value. Note that 
the shape of the trend is invariant with respect to pp, Nt and 7 because their effects 
are multiplicative. The figure shows that choosing 0.1 < Po ^ 0.3 will practically lead to 
similar efficiency and it is not necessary to fine tune the value of the conditional failure 
probability Pq as long as Subset Simulation is implemented properly. 



5 Bayesian Post-Processor for Subset Simulation 

In this section we develop a Bayesian post-processor SS+ for the original Subset Simula- 
tion algorithm described in Section [21 that provides more information about the value of 
Pf than a single point estimate. 

Recall that in SS the failure probability pp is represented as a product of conditional 
probabilities pj = P{Fj\Fj_i), each of which is estimated using for j = I and f lT2|) for 
j = 2, . . . , m. Let rij denote the number of samples • • • , Oj^\ that belong to subset 
Fj. The estimate for probability pj is then: 



Hi 



P. = ^ (28) 



and the estimate for the failure probability defined by ([T]) is: 



m m 



^^'=n^^.=ni- (29) 



=1 i=i 



In order to construct a Bayesian post-processor for SS, we have to replace the frequentist 
estimates ( 128|) in (129|) by their Bayesian analogs. In other words, we have to treat all 
Pi, . . . ,Pm and pp as stochastic variables and, following the Bayesian approach, proceed 
as follows: 

1. Specify prior PDFs p{pj) for all pj = P{Fj\Fj^i), j = 1, . . . , m; 
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2. Update each prior PDF, using new data "Pj-i = {^j-n • • • ; ^j^i ~ ^('l-^j-i)}) i-^- 
find the posterior PDFs p{pj\Vj_i) via Bayes' theorem; 

3. Obtain the posterior PDF p{pf\ U"!!^^ Vj) of pp = YVjLiPj from p{pi\T)o) , . . . , 

p{Pm\1^rn-l)- 

Remark 8. The term "stochastic variable" is used rather then "random variable" to em- 
phasize that it is a variable whose value is uncertain, not random, based on the limited 
information that we have available, and for which a probability model is chosen to de- 
scribe the relative plausibility of each of its possible values [TJ [21] . The failure probability 
Pf is a. constant given by ([1]) which lies in [0, 1] but its exact value is unknown because 
the integral cannot be evaluated exactly, and so we quantify the plausibility of its values 
based on the samples that probe the performance function g. 

To choose the prior distribution for each pj, we use the Principle of Maximum Entropy 
(PME), introduced by Jaynes [12]. The PME postulates that, subject to specified con- 
straints, the prior PDF p which should be taken to represent the prior state of knowledge 
is the one that gives the largest measure of uncertainty, i.e. maximizes Shannon's entropy 
which for a continuous variable is given by H{p) = — J^^p{x) log p{x)dx. Since the set 
of all possible values for each stochastic variable pj is the unit interval, we impose this as 
the only constraint for p{pj), i.e. supp p{pj) = [0, 1]. It is well known that the uniform 
distribution is the maximum entropy distribution among all continuous distributions on 
[0,1], so 

p(p,) = l, 0<p,<l. (30) 

Remark 9. We could choose a more informative prior PDF, perhaps based on previous 
experience with the failure probabilities for similar systems. If the amount of data is large 
(i.e. N is large), however, then the effect of the prior PDF on the posterior PDF will 
be negligible if the likelihood function has a unique global maximum. This phenomenon 
is usually referred to in the hterature as the "stability" or "robustness" of Bayesian 
estimation. 

Since initial samples 61^\...,6q^^ are i.i.d. according to vr, the sequence of zeros 
and ones, /^^(^o^'*), . . . , /^^(^q^'*), can be considered as Bernoulli trials and, therefore, the 
likelihood function p(T>o\pi) is a binomial distribution where Vq consists of the number 
of -Fi-failure samples ni = XlfcLi ^Fii^o'^) and the total number of samples is N. Hence, 
the posterior distribution of pi is the beta distribution Be{ni + 1, — ni + 1) (e.g. [T6] ) 
with parameters (ni -|- 1) and (A^ — ni + 1), i.e. 

which is actually the original Bayes' result [Hj. The beta function B in (131 p is a normal- 
izing constant. If j > 2, all MCMC samples dfli, ■ ■ ■ ? are distributed according to 
7r(-|Fj_i), however, they are not independent. Nevertheless, analogously to the frequen- 
tist case, where we used these samples for statistical averaging ( IT2|) as if they were i.i.d., 
we can use an expression similar to ( 13T|) as a good approximation for the posterior PDF 
piPjlVj^i) for j > 2, so: 
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where rij = Ylk=i -^Fji^f-i) number of F^-failure samples. Note that in Subset 

Simulation the MCMC samples O^^li, . . . , 6j^\ consist of the states of multiple Markov 
chains with different initial seeds obtained from previous conditional levels. This makes 
the approximation (1521) more accurate in comparison with the case of a single chain. 

Remark 10. It is important to highlight that using the r.h.s of ( 132|) as the posterior PDF 
p{pj\Vj^i) is equivalent to considering samples Ofli, • • • , O^^l as independent. This proba- 
bility model ignores information that the samples are generated by an MCMC algorithm, 
just as it ignores the fact that the random number generator used to generate these sam- 
ples is, in fact, a completely deterministic procedure. One corollary of this neglected 
information is that generally 5p(pp) < Spss, where is the c.o.v. of the posterior PDF 

of pf and 5^ss is the c.o.v. of the original SS estimator pp^ (see also the numerical ex- 
amples in Section [6]). Notice, however, that the two c.o.v.s are fundamentally different: 
^pipp) is defined based on samples generated from a single run of SS, while the frequentist 
c.o.v. 6ps^s is defined based on repeated runs of SS (an infinite number of them!). 

The last step is to find the PDF of the product of stochastic variables pp = YYjLiPjy 
given the distributions of all factors pj by ( l32l) . 

Remark 11. Products of random (or stochastic) variables play a central role in many 
different fields such as physics (interactive particle systems), number theory (asymptotic 
properties of arithmetical functions), statistics (asymptotic distributions of order statis- 
tics), etc. The theory of products of random variables is well covered in |15] . 

In general, to find the distribution of a product of stochastic variables is a non-trivial 
task. A well-known result is Rohatgi's formula |37]: if Xi and X2 are continuous stochastic 
variables with joint PDF fxi,X2i then the PDF oiY = X1X2 is 

fY{y)= j /xi.Xa U, - j — rfx. (33) 

This result is straightforward to derive but it is difficult to implement, especially when 
the number of stochastic variables is more than two. In the special case of a product of 
independent beta variables. Tang and Gupta [lO] derived an exact representation for the 
PDF and provided a recursive formula for computing the coefficients of this representation. 

Theorem 1 (Tang and Gupta, ^0]). Let Xi, . . . , X^ be independent beta variables, Xj ~ 
Be{aj, bj), and Y = X1X2 . . . Xm, then the probability density function ofY can be written 
as follows: 

fviy) = (fl ^^^^^^ 1 ^"'""'(1 - y)^^'"'^' • E^^Hi - yy, o<y<i, (34) 

where T is the gamma function and coefficients cr^™^ are defined by the following recurrence 
relation: 



^(fc) _ r(X]j=i bj + r) Jl^ (ofc + bk- afc„i)s ^(fc-i) 



<' = s.t:r: . : >: , ^""^ r-^r^ r=o,i,..., A: = 2,...,m, (35) 



with initial values 



a(^) = ^, aW = 0/orr>l. (36) 



Here, for any real number a G M, {a)s = «(« + 1) . . . (a + s — 1) = ^p"^-j^'' 
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We can obtain the posterior PDF of stochastic variable by applying this theorem 
to pf = rijliPj' where pj ~ Beijij + 1, N — rij + 1). 

Let p^^^{j)f\ ^^=0 T^j) denote the right-hand side of (!34|) with = rij + 1 and hj = 
N — Uj + 1, and pff^p be the maximum a posteriori (MAP) estimate, i.e. the mode of 

p^J+p = arg max UYJ^' V^). (37) 

PfG[0,1] 

Since the mode of a product of independent stochastic variables is equal to the product 
of the modes, and the mode of the beta variable X ~ Be{a, 6) is (a — l)/(a + 6 — 2), we 
have: 

m m 

pifXp = mode {p''^{pf\ Uf",' V,)) = J] mode (p(p,|P,_i)) = 11^= (^8) 

i=i i=i 

Thus, the original estimate pfi^ of failure probability obtained in the original Subset 
Simulation algorithm is just the MAP estimate P^map corresponding to the PDF p^^^ . 
This is a consequence of the choice of a uniform prior. 

Although ( 134|) provides an exact expression for the PDF of F, it contains an infinite 
sum that must be replaced by a truncated finite sum in actual computations. This 
means that in applications one has to use an approximation of the posterior PDF p^^^ 
based on (134|) . An alternative approach is to approximate the distribution of the product 
Y = YYj=i-^j ^ single beta variable Y ~ Be{a,b), where the parameters a and b are 
chosen so that E[Y] = E[Y] and E[f^'] is as close to EfF'^] as possible for 2 < k < K, 
for some fixed K. This idea was first proposed in [12]. In general, the product of beta 
variables does not follow the beta distribution, nevertheless, it was shown in [13] that the 
product can be accurately approximated by a beta variable even in the case of K = 2. 

Theorem 2 (Fan, [13]). Let Xi, . . . ,Xm be independent beta variables, Xj ~ Be{aj,bj), 
and Y = X1X2 . . . X^, then Y is approximately distributed as Y Be{a, b), where 

a = /ii 2' o=(l-Aii) 2' (39) 

and 

m m ( I T\ 

/^i = E[F] = n^Sr' l^2=nY'] = Y\7 "Ay (40) 

f},aj + b/ f},{aj+bj){aj+b, + l) 

It is easy to check that if F ~ Be{a, b) with a and b given by ( 1391) . then the first two 
moments of stochastic variables Y and Y coincide, i.e. = K[Y] and E[y^] = E[y^]. 

The accuracy of the approximation y~;Be(a, 6) is discussed in [T3] . 

Using Theorem 2, we can therefore approximate the posterior distribution p^^'^ of 
stochastic variable by the beta distribution as follows: 

p''+{Pf\ UfJo' l^j) - p''^{pf\ U"ro' 1^,) = Be{pF\a, b), i.e. pF^Be{a, b), (41) 



where 



rm n,+2 



nm nj+1 / -, _ -r-rm nj+2 \ f _ -r-rm nj+l \ _ TT' 

j=l N+2 \^ lli=l N+3 J 5 _ V / V 

j=l 7V+3 ~ 1 li=l 7V+2 1 li=l JV+3 1 li=l 7V+2 



(42) 
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Since the first two moments of p^^+ and p^^+ are equal (this also means the c.o.v. of 
pSS+ pss+ g^j^g equal), we have: 



m m , ^ 

Uj + 1 



^pSs+[pf\ =EpSs+[pF\ = Y[Ep^p^\v,^^)[pj] = Y[ 



N + 2' 

(43) 



i=i j-- 

Notice that 



^7 

A (Ar + 2)(iV + 3) 



lim Egss+fpiT'l = lim pp^, and Egss+fpirl ~ Pp"^, when is large (44) 



so, the mean value of the approximation p^^~^ to the posterior PDF p^^~^ is accurately 
approximated by the original estimate pp^ of the failure probability pp. 

Let us now summarize the Bayesian post-processor of Subset Simulation. From the 
algorithmic point of view, SS+ differs from SS only in the produced output. Instead of a 
single real number as an estimate of pi?, SS+ produces the posterior PDF p^^^{pp\ ^Y=o 
T>j) of the failure probability, which takes into account both prior information and the 
sampled data U^lT^Vj generated by SS, while its approximation p^^^{pF\^Y=o'^i^ more 
convenient for further computations. The posterior PDF p'^^^ and its approximation p^^~^ 
are given by (!34|) and f l4T|) . respectively, where 

r PoiV, ifj<m; 
' \Np, ifj=m, ^^^^ 

and m is the total number of intermediate levels in the run of the algorithm. The rela- 
tionship between SS and SS-|- is given by fl55]) and flH|) . Namely, the original estimate 
Pp^ of the failure probability based on the samples produced by the Subset Simulation 
algorithm coincides with the MAP estimate corresponding to p^^~^, and it also accurately 
approximates the mean of p^^~^. Also, the c.o.v. of p'^^^ and p^^~^ coincide and can be 
computed using the first two moments in (143|) . 

Remark 12. Note that to incorporate the uncertainty in the value of pp, one can use the 
full PDF p^^+ for life-cycle cost analyses, decision making under risk, and so on, rather 
than just using a point estimate of pp. For instance, a performance loss function £ often 
depends on the failure probability. In this case one can calculate an expected loss given 
by the following integral: 

E[C{pp)]= [' C{pF)p'''+{pp)dpp, (46) 
Jo 

which takes into account the uncertainty in the value of the failure probability. 

Remark 13. We note that a Bayesian post-processor for Monte Carlo evaluation of the 
integral ([1]), denoted as MC-I-, can be obtained as a special case of SS-I-. The posterior 
PDF for the failure probability pp based on A^ i.i.d. samples V = {9^^\ . . . ,9^^^} from 
7r(-) is given by 

p^''^+{pp\V) = Be{pF\n + 1, A^ - n + 1), (47) 
where n = XlfcLi If{9''^^) is the number of failure samples. 
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6 Illustrative Examples 



To demonstrate the Bayesian post-processor of Subset Simulation, we consider its ap- 
plication to two different reliability problems: a linear reliability problem and reliability 
analysis of an elasto-plastic structure subjected to strong seismic ground motion. 

6.1 Linear Reliability Problem 

As a first example, consider a linear failure domain. Let d = 10^ be the dimension of the 
linear problem and suppose pf = 10~^ is the exact failure probability. The failure domain 
F is defined as 

F = {9 eW^ : {9,e) > (3}, (48) 

where e is a unit vector and /3 = — pp) ^ 3.09 is the reliability index. This 

example is one where FORM UT] gives the exact failure probability in terms of /3. 
Note that 6* = /3e is the design point of the failure domain F [261 EZj. The failure 
probability estimate pp^ obtained by SS and the approximation of the posterior PDF 
pss+ obtained by SS-|- are given in Fig. |8] based on a number of samples = 10^ at each 
level (m = 3 levels were needed). Observe that p^^^ is quite narrowly focused (with the 
mean /ipss+ = 1.064 x 10~^ and the c.o.v. 6pss+ = 0.16) around pp^ = 1.057 x 10~^, 
which is very close to the exact value. Note that the frequentist c.o.v. of the original SS 
estimator pp^ is 6pss = 0.28 (based on 50 independent runs of the algorithm). 

6.2 Elasto-Plastic Structure Subjected to Ground Motion 

This example of a non-linear system is taken from [T]. Consider a structure that is 
modeled as a 2D six-story moment-resisting steel frame with two-node beam elements 
connecting the joints of the frame. The floors are assumed to be rigid in-plane and the 
joints are assumed to be rigid-plastic. The yield strength is assumed to be 317 MPa for 
all members. Under service load condition, the floors and the roof are subjected to a 
uniformly-distributed static span load of 24.7 kN/m and 13.2 kN/m, respectively. For the 
horizonal motion of the structure, masses are lumped at the floor levels, which include 
contributions from live loads and the dead loads from the floors and the frame members. 
The natural frequencies of the first two modes of vibration are computed to be 0.61 Hz 
and 1.71 Hz. Rayleigh damping is assumed so that the first two modes have 2% of critical 
damping. For a full description of the structure, see . 

The structure is subject to uncertain earthquake excitations modeled as a nonsta- 
tionary stochastic process. To simulate a time history of the ground motion acceleration 
for given moment magnitude M and epicentral distance r, a discrete-time white noise 
sequence Wj = ^/2^TJAtZj, j = 1, . . . , A'j is first generated, where At = 0.03 s is the sam- 
pling time, Nt = 1001 is the number of time instants (which corresponds to a duration of 
30 s), and Zi, . . . , Z^^ are i.i.d. standard Gaussian variables. The white noise sequence 
is then modulated (multiplied) by an envelope function e(t; M, r) at the discrete time 
instants. The discrete Fourier transform is then applied to the modulated white-noise 
sequence. The resulting spectrum is multiplied with a radiation spectrum A[f] M,r) [1], 
after which the discrete inverse Fourier transform is applied to transform the sequence 
back to time domain to yield a sample for the ground acceleration time history. The 
synthetic ground motion a{t; Z, M, r) generated from the model is thus a function of the 
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Gaussian vector Z = (Zi, . . . , ^'tvJ^ and stochastic excitation model parameters M and 
r. Here, M = 7 and r = 50 km are used. For more details about the ground motion 
sampling, refer to |T]. 

In this example, the uncertainty arises from seismic excitations and the uncertain 
parameters 9 = Z, the i.i.d. Gaussian sequence Zi, . . . , Z^t that generates the synthetic 
ground motion. The system response of interest, g{9), is defined to be the peak (absolute) 
interstory drift ratio 5max = maxj^i g 5j, where Si is the maximum absolute interstory 
drift ratio of the i"" story within the duration of study, 30 s. The failure domain F C M^* 
is defined as the exceedance of peak interstory drift ratio in any one of the stories within 
the duration of study. That is 

F = {0 G M^' : 5^ax(^) > b}, (49) 

where b is some prescribed critical threshold. In this example, b = 0.5% is considered, 
which, according to [13], corresponds to "operational" damage level. For this damage 
level, the structure may have a small amount of yielding. 

The failure probability is estimated to be equal to pp = 8.9 x 10"'^ (based on 4 x 10^ 
Monte Carlo samples). In the application of Subset Simulation, three different implemen- 
tation scenarios are considered: = 500, = 1000, and = 2000 samples are simulated 
at each conditional level. The failure probability estimates pp^ obtained by SS for these 
scenarios and the approximation of the corresponding posterior PDFs p^^~^ obtained by 
SS-|- are given in Fig. [9l Observe that the more samples used (i.e. the more information 
about the system that is extracted), the more narrowly p^^~^ is focused around pp^, as 
expected. The coefficients of variation are 6pss+ = 0.190, 6pss+ = 0.134, and 6pss+ = 0.095 
for = 500, A^ = 1000, and A^ = 2000, respectively. The corresponding frequentist coef- 
ficients of variation of the original SS estimator pp^ are 6pss = 0.303, 6pss = 0.201, and 
6pss = 0.131 (based on 50 independent runs of the algorithm). In SS+, the coefficient 
of variation 6pss+ can be considered as a measure of uncertainty, based on the generated 
samples. 



7 Conclusions 

This paper focuses on enhancements to the Subset Simulation (SS) method, an efficient 
algorithm for computing failure probabilities for general high-dimensional reliability prob- 
lems proposed by Au and Beck [2] . 

First, we explore MMA (Modified Metropolis algorithm), an MCMC technique em- 
ployed within SS. This exploration leads to the following nearly optimal scaling strategy 
for MMA: at simulation level j > 1, select aj such that the corresponding acceptance rate 
Pj is between 30% and 50%. 

Next, we provide a theoretical basis for the optimal value of the conditional failure 
probability pq. We demonstrate that choosing any po G [0.1,0.3] will lead to similar 
efficiency and it is not necessary to fine tune the value of the conditional failure probability 
as long as SS is implemented properly. 

Finally, a Bayesian extension 5* 5*+ of the original SS method is developed. In SS+, the 
uncertain failure probability pp that one is estimating is modeled as a stochastic variable 
whose possible values belong to the unit interval. Instead of a single real number as an 
estimate as in SS, SS+ produces the posterior PDF p^^~^{pp) of the failure probability. 
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which takes into account both prior information and the information in the samples gen- 
erated by SS. This PDF quantifies the uncertainty in the value oipp based on the samples 
and prior information and it may be used in risk analyses to incorporate this uncertainty, 
or its approximation p^^~^{pf), which is more convenient for further computations. The 
original SS estimate corresponds to the most probable value in the Bayesian approach. 
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Appendix 

In this Appendix we give a detailed proof that vr(-|F) is the stationary distribution for the 
Markov chain generated by the Modified Metropolis algorithm described in Section |2l 

Theorem 3 (Au and Beck, [2]). Let 9^^\9^'^\... be the Markov chain generated by the 
Modified Metropolis algorithm, then 7r(-|F) is a stationary distribution, i.e. if 9^'^^ is dis- 
tributed according to 7r(-|F), then so is 9^'''~^^\ 

Proof. Let K denote the transition kernel of the Markov chain generated by MMA. From 
the structure of the algorithm it follows that K has the following form: 

K{d9^'+^^\9^'^) = k{d9^'+^^\9^'^)+r{9^'^)6ei^){d9^'+'^), (50) 

where k describes the transitions from 9^^^ to 9^^~^^^ ^ 6"^*\ r(9^''^) = 1 — k(d9'\9^^^) is the 
probability of remaining at 6"-*^ and 5$ denotes point mass at 9 (Dirac measure). Note 
that if 9^^^^^^ 7^ 9^^\ then k{d9^^~^^'^\9^^'>) can be expressed as a product of the component 
transitional kernels: 

d 

k{d9^'+'^\9^'^) = llk,{d9f^'^\9f), (51) 
j=i 

where kj is the transitional kernel for the j^^ component of 9^''\ By definition of the 
algorithm 

k,{d9f^'^\9f) = 5,(^f ) min |l, ^^1^1 d9f^'^ + r,{9f)5^,,{dB^^^) (52) 

A sufficient condition for vr(-|F) to be a stationary distribution for K is to satisfy the 
so-called reversibility condition (also sometimes called the detailed balance equation): 

7r(ci^« |F)X(d^(^+i) |^«) = 7r(rf^('+^) |F)i^(ci^« 1^(^+1)) (53) 
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Indeed, given (1531) and 9^^^ ~ 7r(-|F), the distribution of 9^^~^^^ is 



V (54) 

= TT{d9^'+^^\¥) [ K{d9^'^\9^'+^^) 
Jw 

= 7i{d9^'+'^\¥), 

since K{d9^'^9^'+^'^) = 1. 

So, all we need to prove is that the transition kernel K of the MM algorithm satisfies 
the reversibility condition ( 153|) . Since all the Markov chain samples lie in F, it is sufficient 
to consider the transition only between states in F. Hence, without loss of generality, we 
assume that both 9^^^ and 9^'^^^^ belong to F. In addition, we assume that 9^^^ ^ 9^'^~^^\ 
since otherwise ( l53l) is trivial. It then follows from ( l50l) that in this case K{d9^^~^^^\9^^^) = 
k{d9^'+^^\9^'^). Taking into account (EH), we can rewrite the reversibility condition f l53|) in 
terms of components: 

f[r:,{9f%{d9f^'^\9f)d9f = f[n,{9f^'%{d9f\9f^'^)d9f^'\ (55) 

Therefore, it is enough to show that for all j = 1, . . . ,d 

7rj{9f%{d9f^'^\9f)d9f = 7rj{9f+'%{d9f\9f^'^)d9f^'\ (56) 

If 9f = 9f^^\ then §E> is trivial. Assume that 9f 7^ 9f'^^K Then, it follows from §^ 
that fc,(rf^f+'V?) = sM^'^\9'f^)mm (l, ^^^C^l d9f^^\ and dSSD reduces to 

^.(^;- )'S',(^j 1g]0mm|l, ^ ^^^^^ >=T^m Omm|l,-p^ 

^57) 

Since Sj is symmetric and 6min{l, a/h] = amin{l, b/a} for any positive numbers a and b, 
(!57|) is satisfied. This proves that vr(-|F) is a stationary distribution of the MMA Markov 
chain. □ 

Remark 14. A stationary distribution is unique and, therefore, is the limiting distribution 
for a Markov chain, if the chain is aperiodic and irreducible (see, for example, [H]). In 
the case of MMA, aperiodicity is guaranteed by the fact that the probability of having a 
repeated sample 9^^^^^^ = 6"^*^ is not zero. A Markov chain with stationary distribution p(-) 
is irreducible if, for any initial state, it has positive probability of entering any set to which 
p{-) assigns positive probability. It is clear that MMA with "standard" proposal distri- 
butions (e.g. Gaussian, uniform, log-normal, etc) generates irreducible Markov chains. 
In this case, 7r(-|F) is therefore the unique stationary distribution of the MMA Markov 
chain. 
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Simulation Level j 


1 


2 


3 


4 


5 


6 


Example 1, 0"°^* 


0.9 


0.7 


0.4 


0.3 


0.3 


0.3 


Example 2, 0"°^* 


1.1 


0.8 


0.6 


0.4 


0.4 


0.4 



Table 1: Approximate values of the optimal spread for different simulation levels 



Simulation Level j 


1 


2 


3 


4 


5 


6 


Example 1, Pj(l) 


49% 


30% 


20% 


13% 


9% 


6% 


Example 1, p^^^ 


51% 


39% 


47% 


50% 


44% 


40% 


Example 2, Pj{l) 


53% 


35% 


23% 


16% 


11% 


8% 


Example 2, 


52% 


41% 


40% 


49% 


43% 


37% 



Table 2: Approximate values of the acceptance rates Pj(l) and p°^^ that correspond to the reference 
value aj = 1 and the optimal spread Cj = '^'j^^ ^ respectively 
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Figure 1: Modified Metropolis algorithm 
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Fi gure 2: Subset Simulation algorithm: disks • and circles o represent samples from 7r(-|i^j_i) and 
7r(-|fj), respectively; circled disks are the Markov chain seeds for 7r(-|Fj) 
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Figure 6: The 7-efRciency of the Modified Metropohs algorithm as a function of the acceptance rate 
for simulation levels j = 1, . . . , 6 
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Fi gure 8: The failure probability estimate pf.^ obtained by SS and the approximation of the posterior 
PDF p^^+ obtained by SS+. The posterior PDF has mean fipss+ = 1.064 x 10^'^ and the c.o.v. Spss+ = 
0.16 
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Figure 9; The failure probability estimates pf,^ obtained by SS and the approximation of the posterior 
PDF p^^+ obtained by SS+ for three computational scenarios: N = 500, N = 1000, and N = 2000 
samples at each conditional level. 
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